Protocol for predicting drug-resistant protein mutations to an ERK2 inhibitor using RESISTOR

Summary Prospective predictions of drug-resistant protein mutants could improve the design of therapeutics less prone to resistance. Here, we describe RESISTOR, an algorithm that uses structure- and sequence-based criteria to predict resistance mutations. We demonstrate the process of using RESISTOR to predict ERK2 mutants likely to arise in melanoma ablating the efficacy of the ERK1/2 inhibitor SCH779284. RESISTOR is included in the free and open-source computational protein design software OSPREY. For complete details on the use and execution of this protocol, please refer to Guerin et al..1


SUMMARY
Prospective predictions of drug-resistant protein mutants could improve the design of therapeutics less prone to resistance. Here, we describe RESISTOR, an algorithm that uses structure-and sequence-based criteria to predict resistance mutations. We demonstrate the process of using RESISTOR to predict ERK2 mutants likely to arise in melanoma ablating the efficacy of the ERK1/2 inhibitor SCH779284. RESISTOR is included in the free and open-source computational protein design software OSPREY. For complete details on the use and execution of this protocol, please refer to Guerin et al.. 1

Timing: 1 h
This section describes the minimal hardware and operating system requirements, where to obtain the requisite software and its installation procedure, and the file formats of the sequence and structural inputs required to run RESISTOR. For the purposes of demonstration, we use RESISTOR to predict resistance mutations on the ERK2 kinase to the inhibitor SCH772984 (hereafter referred to as SCH7). Previously, we have used RESISTOR to prospectively predict resistance mutations in EGFR and BRAF, which we then validated experimentally. 1 In addition, we have employed aspects of RESISTOR, including multistate K* design and mutational signature probabilities, in other applications, such as our development of algorithms like BBK* (Branch and Bound Over K*) 2 and our predictions of resistance-conferring mutations to inhibitors of kinases such as KIT, EGFR, ABL1, and ALK. 3 Here we offer abbreviated definitions and references for the terminology we use throughout this protocol. Positive design is the use of computational protein design algorithms to improve an objective, such as ligand binding. Negative design is the opposite, i.e., the goal is to make an objective worse, such as to ablate binding. RESISTOR uses multistate design, [4][5][6][7][8] or both positive and negative design in parallel, to mimic how mutations affect the competitive balance between a protein's endogenous ligand and a competitive inhibitor. Resistance can occur via a protein's increased activity with its endogenous ligand, decreased binding with an inhibitor, or a combination of these factors. 1,3,6,7 RESISTOR employs Pareto optimization over positive and negative design, mutational signature probabilities, and hotspot scores to rank prospective resistance mutants. The positive and negative design portions use the K* algorithm 9 implemented in OSPREY, 10 which generates low-energy molecular ensembles to compute the partition functions the algorithm uses to provably approximate binding affinity, K a . 4,9 Mutational signature probabilities are derived from data provided by Alexandrov et al. 11 and denote the probability a DNA base will mutate to another base in a given sequence context and cancer type. A hotspot score is the number of sequences with a mutation at a particular residue location which multistate design criteria predicts as structural resistance mutations. 1,3 Figure 1 contains a conceptual overview of the RESISTOR protocol's main phases.
Hardware and software RESISTOR requires a minimum of 32 GiB of RAM and 5 GiB of free hard disk space. You also will need to have a good text editor on your computer: vim, emacs, or any other text editor that can be used for editing ASCII characters will do; programs like Microsoft Word or LibreOffice Writer will not. Our demonstration of the protocol is on a Linux operating system, although with minor adjustments the process below could be carried out on Windows and macOS operating systems.

Installing the Software Dependencies
Timing: 20 min RESISTOR requires Java 17, Miniconda, AmberTools, Julia, and OSPREY. The first phase, Preparation, involves obtaining the positive and negative design structure files in PDB format, along with the corresponding cDNA sequence. The structures need to be prepared for OSPREY K* design. Finally, the structures and additional inputs (outlined below) must be collected into a YAML design file. The second phase, Execution, involves using the OSPREY K* algorithm to compute provable approximations to the binding constant, K a , and using RESISTOR to filter the results, assign mutational probabilities, and Pareto optimize. The final phase, Analysis, is where the user examines the RESISTOR-provided output of Pareto ranks and low-energy molecular ensembles. The details involved in each of these steps are explained comprehensively in this STAR Protocol.

Install Java 17
a. Download an archive for the latest version of Java 17 for your platform from https://jdk.java. net/archive/ b. Extract the archive to a location on your computer, e.g., $HOME/java/jdk-17.0.2. c. In your shell's profile, set the JAVA_HOME environment variable to the location you extracted the archive to, and add the java executable to your path, e.g., d. Verify java is available on your shell's path by opening a new terminal window, typing java -version, and hitting enter. You should see output like the following: Video S1 demonstrates this procedure on Fedora Linux. 8. Choose your cancer-type specific mutational probabilities JSON file. a. Identify the probabilities file you need. For this protocol, we will use the melanoma probabilities file, melanoma.json. b. Mark down the path to this file, which you will use in Assigning Pareto Ranks step 11.
Note: The RESISTOR directory within the OSPREY distribution (osprey-3.3/resistor) contains mutational probability files for melanoma, non-small cell lung cancer, stomach cancer and pancreatic cancer. It is also possible to create your own mutational probabilities file, which is covered elsewhere. 15 CRITICAL: Ensuring that the following prerequisites are met helps avoid downstream prediction problems: 1. When possible, use high-quality, high-resolution structures. While the cut-off for resolution is still a matter of discussion in the scientific community, previous successful designs have used X-Ray diffraction resolutions ranging between 1.4 and 3.15 Å . 6,7,[16][17][18] We have also had success with cryo-EM resolutions between 3.4 and 11.5 Å . For NMR structures, we recommend that the structure determination use RDCs; 2. Check that the residue numbers and amino acid types in the positive and negative protein structures are the same, e.g., ALA 10 in the structure for the positive design and ALA 10 in the structure for the negative design refer to the same residue; and 3. that the cDNA sequence translates to the amino sequence in the structure files, i.e., they represent the same genetic variant. Furthermore, the FASTA file must begin with the codon that translates to residue number 1. In this example, PDB ID 4qta (ERK2:SCH7) has a resolution of 1.45 Å , PDB ID 2y9q (ERK2:AMP-PNP) has a resolution of 1.55 Å , and the residue numbering in the two structures are the same and correspond to the canonical numbering also used in the FASTA file. See Video S6 for a demonstration of carrying out these checks.
If your checks indicate discrepancies exist, you will need to manipulate the files to resolve them. As the PDB and FASTA file formats are standard in the fields of structural biology and bioinformatics there are many tools available for their manipulation, including Maestro 19 for ll OPEN ACCESS STAR Protocols 4, 102170, June 16, 2023 manipulating structural information. Yet as both file formats are defined in human-readable ASCII text, oftentimes the simplest way to make any necessary tweaks in the files is with a standard text editor, such as emacs or vim.
In cases where empirical structures are not available, it is possible to use docking, homology modeling, or other computational modeling techniques to generate structures. 1,6,7,17,20 For example, computational tools such as Modeller 21 or Alphafold 22 could be used to predict an initial protein structure, and docking tools such as AutoDock Vina 23 or those included in Maestro 19 could be used to dock the positive and negative design ligands. 1,3 With the recent explosion of available structural models, such as the Alphafold Protein Structure Database, 24 it may even be the case that a computationally predicted starting structure already exists. When taking such an approach, it is critical to have high confidence in the accuracy of any computationally generated structures as RESISTOR is very sensitive to variation in structural input.

STEP-BY-STEP METHOD DETAILS
Here we describe the step-by-step details of how to use RESISTOR. These steps include how to 1) specify the K* positive and negative designs; 2) run OSPREY to compute each mutant's positive and negative K* scores; 3) process the data to assign mutational probabilities and hotspot scores; and, 4) assign Pareto ranks to each prospective mutant. As a demonstration case, we use RESISTOR to predict ERK2 mutants likely to arise in melanoma that may ablate the efficacy of the ERK1/2 inhibitor SCH7.

Specifying the K* positive and negative designs
Timing: 1 h In this step, we create the YAML files that are used to specify the input for the positive and negative K* designs. Positive design refers to improving the interaction between a protein and its endogenous ligand, which in this context is ERK2 with ATP. Negative design refers to ablating the binding between a protein and its targeting inhibitor, here ERK2 and SCH7. By this point, we assume you have completed the steps in the section before you begin, including having downloaded the PDB structure files 4qta.pdb and 2y9q.pdb, and the FASTA-formatted cDNA sequence file ENST00000215832. 10 b. Run pdb4amber on the two ERK2 structures to add any missing atoms: Note: pdb4amber renumbers the residues in the input structures, starting from 1. We would like to keep our canonical residue numbering, and luckily pdb4amber outputs a mapping file from the original numbers to the new numbers it assigned the residues. This file is titled the name of the input file for pdb4amber, suffixed with _renum.txt, e.g., 2y9q.p4a_renum.txt. Within the OSPREY distribution there's a program called p4a-undo.py (found in the osprey3.3/resistor directory) which re-assigns the original numbering and chain identifiers.
c. Using the same AmberTools22 conda environment, run p4a-undo.py with each of the two output structures from pdb4amber: d. Add hydrogens to the AMP-PNP and SCH7 structures using a molecular modeling program such as Maestro.
Note: Epik 27 in Maestro 19 is quite good at correctly predicting pK a and protonation states for small molecules. You will also need to compute the net charge of the small molecules for step 3, which Epik and Maestro provide. SCH7's net charge is +1, whereas AMP-PNP has a net charge of -4. c. Run gen-templ-coords.sh once for each of the ligands, using Unix pipe redirection to save the output. gen-templ-coords.sh expects as input the path of the ligand structure and the threeletter residue name of the ligand used in the structure: 5. Generate rotamers for AMP-PNP and SCH7 a. To allow the ligands to translate, rotate, and flex slightly, we define the flexible dihedrals for the ligands. b. Determine the molecule-specific dihedrals using Maestro or other molecular visualization software. Figure 2 demonstrates determining the dihedrals in Maestro. c. Create a text file listing the dihedrals. The format of the file, and the rotamer specification for AMP-PNP is shown in Figure 3. d. Save the file as amppnp.rot. e. Repeat steps a-d for SCH7, saving that file as sch7.rot. coordinates key. iv. Copy the contents of sch7.prepi as the value for the ligand.extra_templates key. v. Copy the contents of sch7.rot as the value for the ligand.extra_rotamers key. c. To ensure that your YAML file is in the correct format for OSPREY, use the affinity command's --verify-design flag to check the design file:

Protocol
The output of the command should look like: See Video S7 for a demonstration of how to do this step and see troubleshooting 2 if your output is different.
8. Choose residues to mutate and create mutational scan designs. a. Taking the files you created in steps 6 and 7, add a YAML list of objects representing these mutants as the value of the scan.residues key, as is shown in Figure 4.
b. In each of the files generated in steps 6 and 7, set the ligand as flexible by adding it to the ligand.residue_configurations key in the YAML file. Figure 5 shows how this is set in the erk2-amppnp.yaml and erk2-sch7.yaml files. c. After adding these fields, again verify the syntax of the design files is correct: 9. Generate the K* affinity designs for each of the point mutants. Note: The --do-scan flag instructs OSPREY to generate a K* affinity design centered on each of the residues specified in the scan.residues key. These K* affinity designs each include a single mutable residue, which is set to mutate to all the other amino acids, and a flexible shell around the mutating residue. The optional --scan-flex-distance parameter denotes the radius of the OSPREY-generated flexible shell centered on the design's mutable residue. It defaults to 2 Å .
b. Verify that a positive and negative YAML design specification is created for each of the 13 residues of interest set in step 8a. The naming format of these files is {original-name}. {residue}.yaml, e.g., erk2-sch7.A36.yaml. There should be a total of 26 newly created files.

Running the K* predictions
Timing: 1 day to 1 week  The value for the key is a list of objects representing residues in the structure. Each object has a chain key denoting the chain identifier in the structure, a res_num key denoting the residue number, and the aa_type key with the 3-letter amino acid code. In the example above, we specify that Y26 and A52 in chain A of the structure should be included in the scan. Below the ellipsis we would also include objects for I56, R67, E71, Q105, D106, L107, M108, D111, K114, L156, and C166. where {design-file} is the path to one of the design files you generated in step 9, and {frcmod-file} is the path to the ligand-specific forcefield modification file you generated in step 3b, e.g.,: Note: There are optional flags you can pass to the affinity command that could be helpful for your predictions. These flags include --save-confs, --ensemble-dir, and --cuda .
--save-confs takes an integer argument and denotes the number of low-energy conformations from the K* molecular ensemble that OSPREY should save of each sequence. It defaults to not outputting structures; if you want structures add this argument and specify a number greater than 0. --ensemble-dir takes a path as an argument and indicates where structures should be saved. And if you have access to CUDA-enabled Nvidia GPUs, you may find that the --cuda flag substantially decreases the amount of time needed to run your designs.
c. Execute the following command to print the per-residue type K* predictions to the terminal screen. If you also want to save the output (both standard out and standard error) to files, you can use Unix pipes to pipe the output to the tee program: Note: There is an optional parameter, --epsilon, which takes a double value as an argument and defaults to 0.683 (see the supplemental information of Ojewole et al. for justification for this default). 2 Note: --epsilon must be between 0 and 1; values closer to 0 indicate a more accurate partition calculation and are thus more computationally expensive, and vice-versa. We recommend initially running K* affinity designs with an epsilon close to 1, such as 0.9999, and then gradually decreasing epsilon to obtain increasingly accurate K* scores while still completing in a reasonable amount of time.

Assign Pareto Ranks
Timing: 1 h The purpose of this step is to compile and annotate the positive and negative K* mutant predictions with mutational signature probabilities and hotspot scores. In addition, we run the resistor program to compute the cutoff, c, from the K* predictions, and filter mutants that K* predicts not to be resistance mutants, or whose mutational probability is 0, and assign Pareto ranks.
Note: Your positive and negative K* predictions from step 10 should be complete prior to beginning this step.
11. Compile the K* predictions. a. Copy the template CSV file included in the OSPREY distribution (osprey3.3/resistor/ resistor.csv) to erk2-resistor.csv. b. Using the output files from the predictions in step 10, which contain the log 10 K* scores for each of the sequences you evaluated at a particular residue location, fill out the following columns.
i. wild-type residue should have the 3-letter amino acid code for the wild-type residue at residue number. ii. residue number should have the residue number of the residue. iii. mutant residue should have the 3-letter amino acid code for the mutant residue RESISTOR is evaluating. iv. wild-type K* (positive) should have the log 10 K* score computed on the ERK2:AMP-PNP structure for residue number. v. mutant K* (positive) should have the log 10 K* score computed on the ERK2:AMP-PNP structure for residue number when mutant residue is substituted for wild-type residue. vi. wild-type K* (negative) should have the log 10 K* score computed on the ERK2:SCH7 structure for residue number. vii. mutant K* (positive) should have the log 10 K* score computed on the ERK2:SCH772894 structure for residue number when mutant residue is substituted for wild-type residue. c. Complete a new row for each mutant sequence you evaluated in step 10. Each positive/ negative design pair from step 10 evaluated 21 different residue types in each location, meaning we must complete 21 rows for each residue. See Figure 6 for an example of a partially completed worksheet. 12. Run the resistor program to assign mutational signature probabilities, filter predicted benign mutations, and assign Pareto ranks. a. Open a terminal and change into the osprey3-3/resistor directory. b. Download the required Julia dependencies: i. Start the Julia interpreter with the following command: ii. Activate Julia's package manager by hitting the ']' key.
iii. Type instantiate and wait while the package manager downloads the dependencies. iv. Exist the interpreter by entering CTRL-D or typing exit() and hitting enter. c. Run the program to assign the mutational probabilities and cDNA codons to each mutant sequence: where {mut-prob-file} is the path to the mutational probabilities file, {fasta-file} is the path to the cDNA file, {id} is the identifier of the sequence in {fasta-file}, {csv-file} is the path to the CSV file you created in step 11, and {pareto-config} is the path to the default Pareto optimization configuration JSON, e.g.,: This command. i. Fills out the signature probability and codon columns.
ii. Filters rows whose mutant K* (positive) is less than 0, as this indicates the loss of function with the endogenous ligand. 3 iii. Filters rows whose signature probability is 0 (indicating that the mutant can only occur with 3 base changes). Filters mutants whose ratio of positive to negative K* scores is below the cut-off. vi. Fills out the hotspot count column by counting how many resistance mutations remain at each position after the filtering in the prior steps. vii. Fills out the rank column by running Pareto optimization over the mutant K* (positive), mutant K* (negative), signature probability, and hotspot count columns.
It outputs the completed table to standard out. You can redirect it to a file using I/O redirection in Linux or by piping the output to the tee command. Figure 7 provides an example of the output file.
Note: The Pareto JSON specification file is described in the README.md. By default, RESISTOR optimizes over mutational signature probability, the positive and negative K* scores, and the hotspot score. We've provided a template Pareto JSON specification file in the resistor directory, pareto-config.json, which specifies to optimize by maximizing a mutant's signature probability, positive design K* score, and hotspot score, and minimizing the mutant's negative design K* score. If you had other criteria to optimize over you could add these to this Pareto JSON specification file.
Note: There are two additional optional flags to the command above that may be helpful in some circumstances. These flags are --debug and -c0. The -debug flag prints out intermediary CSV files after each filtering and computational step. It also prints the computed cut-off c to standard error. The --c0 flag allows you specify a different value for c 0 , for more information as to what this value is see Guerin et al. 1

EXPECTED OUTCOMES
RESISTOR provides a protocol for ranking potential resistance mutations to existing and prospective therapeutics. In an earlier publication, 1 we used RESISTOR to successfully predict resistance mutations in BRAF and EGFR. In this example, we have applied RESISTOR to predicting resistance mutations to SCH7, an ERK1/2 inhibitor.
As an outcome, the predicted resistance mutations, as well as their Pareto ranks, are contained in the file output in step 12. With that file, one can analyze the predicted change in a mutant's positive K* score and negative K* score, meaning that RESISTOR produces not only binary predictions of a mutant's resistance or sensitivity profile but also whether a mutation is resistant because of increased binding to the endogenous ligand, decreased binding to the therapeutic, or a combination of the two factors. It also uses a specific cancer type's mutational signature to predict how likely it is that a putative resistance mutation will occur in a specific cancer patient population. Additionally, as mentioned in step 10, RESISTOR's use of OSPREY's K* algorithm allows us to output molecular ensembles of low energy conformations for structural analysis. See Figure 8 for an example of the OSPREY-generated low-energy structural ensemble.

LIMITATIONS
In the example we provided above for ERK2 and SCH7, we investigated only potential resistance mutations occurring within the binding pocket of the ligands. Modeling allosteric pathways to resistance, for example mutations distant from the binding pocket on the opposite side of ERK2 causing large-scale conformational rearrangement, while a goal of OSPREY, is not something we've yet incorporated into RESISTOR. Additionally, RESISTOR does not model resistance caused by phenomena such as splice variants, amplification, or mutations in related genes, which have been shown to be important in N-RAS, MEK1, MEK2, and other genes. 28 Additional modeling to incorporate these causes of resistance is left to future work.

Potential solution
There are different potential causes for this problem. If instead of help text you see the following printed out: then you have not correctly installed and configured Java 17 as detailed in before you begin, step 1. Redo this step and try again. If the message is: Figure 8. OSPREY-generated structural ensembles of ERK2 E71K Top: ERK2 E71K with SCH7. R70 and 71K are labeled, and SCH7 is in green. Bottom: ERK2 E71K with AMP-PNP. R70 and 71K are labeled, and AMP-PNP is green and purple. According to Brennen et al., 21 the E71K mutation grants ERK2 resistance to SCH7. RESISTOR correctly predicts this resistance mutation and ranks it in top Pareto rank. These two OSPREY-generated, low energy structural ensemble files are included in Data S2.
> osprey affinity --help ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. Please set the JAVA_HOME variable in your environment to match the location of your Java installation.
> osprey affinity --help Error occurred during initialization of boot layer java.lang.module.FindException: Module jdk.incubator.foreign not found ll OPEN ACCESS then it is possible that you are using a version of java that is newer than Java 17. At the current time only Java 17 is supported. It is often the case that there are multiple versions of Java installed in an operating system, and the default version in your operating system may not be Java 17. You can confirm that you are running the correct version of Java for OSPREY by running the command.
Below is a demonstration of the output of that command showing the incorrect version of Java: The remedy in this case is to ensure that you have downloaded and configured Java 17, as detailed in before you begin, step 1.

Problem 2
When using the --verify-design option to the affinity command, you see output indicating that indicates a residue was deleted for not having a matching template (related to steps 6 and 7).

Potential solution
Note which residue the command says it deleted. The output tells you the atoms that it expects to find. Open the YAML file and look at that corresponding residue in either the protein or ligand coordinates. Identify the missing atoms and add them into the structure using a molecular visualization program such as Maestro. If just the labeling is off, fix the labeling. See Video S8 for a demonstration of how to do this.

Problem 3
When using the --verify-design option to the affinity command, you see output indicating that the residue does not exist (related to steps 6 and 7).

Potential solution
Look at the coordinates section of the YAML file for the residue mentioned in the error message. Ensure the residue's number and amino acid identifier matches that used in the scan. See Video S9 for a demonstration of this issue and resolution steps.

Problem 4
OSPREY fails to parse the design file YAML specification (related to steps 6 and 7).

Potential solution
Use a YAML validator, such as yamllint, which can indicate on which line the YAML syntax is broken.
Assuming you have installed yamllint as described in step 3 of Installing the Software Dependencies, default invocation of yamllint would look as follows:

> yamllint {design-file}
Any errors will be identified with a description of the problem and the line number. Address them as appropriate. Additionally, the official YAML specification 29 is a good resource for learning how YAML documents are written and parsed.

Problem 5
The osprey affinity command begins to run but after some time fail to complete with an error (related to running the k* predictions).

Potential solution
The most common reason osprey affinity fails is that the design has run out of memory. The error output might look like this: The important thing to remember is that error stack traces are read from the bottom to the top. Scroll to the bottom of the error and if you see a message that looks like: The affinity command failed because it ran out of memory. There are two potential solutions to try. The first is to increase the amount of memory allocated to OSPREY, if possible. This is defined in the JAVA_OPTS environment variable, e.g., to allocate 720 gigabytes to the Java heap, use: For designs where you run out of memory, the first attempt should be to try to make more memory available to OSPREY. If that is not possible, then the second potential solution is to reduce the number of flexible residues in your design. Oftentimes removing one or two flexible residues will allow a previously difficult design to finish. This should only be done when necessary, as removing flexible residue can reduce the accuracy of the predictions.

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Bruce R. Donald (brd+sp22@cs.duke.edu).

Materials availability
This study did not generate new unique reagents.

Data and code availability
The code and datasets used in this study is available at https://github.com/donaldlab/OSPREY3/ releases/3.